ViCoS Lab

MUXAD
Multimodal Image Understanding for Explainable Anomaly Detection

basic research project
January 2025 - December 2027

Collaborating partners

  • University of Ljubljana, Faculty of Computer and Information Science

Funding

  • ARIS (J2-60055)

Researchers

Danijel Skočaj, PhD
Matej Kristan, PhD
Domen Tabernik, PhD
Matic Fučka, MSc
Vitjan Zavrtanik, PhD

Project overview

Rapid advances in artificial intelligence, particularly in computer vision and natural language processing, have allowed deep learning to reach impressive performance across many tasks. However, fundamental challenges remain concerning the depth of AI's understanding and its ability to explain its decisions. This project addresses these issues by focusing on anomaly detection in images through multimodal models that not only detect whether and where something is anomalous but also understand and explain why.

The core objective is to integrate visual and linguistic information to tackle three key challenges in contemporary AI: semantic image understanding, multimodal image understanding, and multimodal explanations. The first research challenge, Semantic Image Understanding, targets the limitations of current anomaly detection methods by enhancing models’ ability to recognize complex logical and structural anomalies beyond surface-level defects. The second challenge, Multimodal Image Understanding, develops zero-shot anomaly detection approaches that leverage vision-language models without prior exposure to specific object classes, supplemented by textual descriptions of anomalies at both task and instance levels. The third challenge, Multimodal Explanations, focuses on enriching visual anomaly explanations with textual descriptions, improving the intuitiveness and transparency of the models.
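To make the zero-shot idea concrete, the following minimal sketch scores an image by comparing its embedding with the text embeddings of a "normal" prompt and an "anomalous" prompt, in the style of CLIP-based approaches. It is an illustration only and not the method developed in MUXAD: the function names, the temperature value, and the toy three-dimensional vectors standing in for encoder outputs are all assumptions.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def zero_shot_anomaly_score(image_emb, normal_text_emb, anomalous_text_emb,
                            temperature=0.07):
    """Return the probability of 'anomalous' from a two-way softmax.

    All three inputs are 1-D embeddings assumed to live in a shared
    vision-language space (hypothetical; no real encoder is called here).
    """
    image_emb = l2_normalize(np.asarray(image_emb, dtype=float))
    prompts = l2_normalize(
        np.stack([normal_text_emb, anomalous_text_emb]).astype(float)
    )
    # Cosine similarity to each prompt, sharpened by the temperature.
    logits = prompts @ image_emb / temperature
    logits = logits - logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(probs[1])


# Toy stand-ins for text and image encoder outputs (assumed values).
normal_text = np.array([1.0, 0.0, 0.0])
anomalous_text = np.array([0.0, 1.0, 0.0])
good_image = np.array([0.9, 0.1, 0.2])
bad_image = np.array([0.2, 0.8, 0.1])

score_good = zero_shot_anomaly_score(good_image, normal_text, anomalous_text)
score_bad = zero_shot_anomaly_score(bad_image, normal_text, anomalous_text)
print(f"good image: {score_good:.3f}, bad image: {score_bad:.3f}")
```

No object-specific training data enters the score: only the two prompts define what counts as anomalous, which is why textual descriptions at task and instance level can refine the decision.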

In the first project period, work was concentrated on WP1 and WP4, while establishing the initial research line for WP2. The strongest outputs already include SALAD, published at ICCV 2025, and the complementary journal paper No Label Left Behind in the Journal of Intelligent Manufacturing. Supporting results additionally cover multimodal reasoning with large language models, difficulty assessment for anomaly-detection benchmarks with DIAD, and data-efficient few-shot detection with PyramidCore.

MUXAD aims to advance anomaly detection through multimodal AI, producing models that are not only accurate but also interpretable and explainable, as a step toward transparent AI systems.

Expected contributions of the project are:

  • Enhanced semantic image understanding for detecting complex and logical anomalies beyond surface defects.
  • Development of zero-shot multimodal anomaly detection methods that combine visual and linguistic data without prior exposure to specific classes.
  • Creation of multimodal explanation techniques that combine visual anomaly localization with rich textual descriptions.
  • Application of the developed methods to manufacturing visual inspection and medical imaging interpretation.

Work packages

  • Development of advanced methods for semantic image understanding aimed at detecting complex anomalies, including SALAD, the current flagship ICCV 2025 result of the project (WP1).
  • Creation of multimodal image understanding approaches integrating vision and language for zero-shot anomaly detection, including AnomalyVFM as an emerging, high-visibility research line and supporting conference outputs such as the paper Detekcija logičnih anomalij z uporabo velikih jezikovnih modelov (Logical anomaly detection using large language models) (WP2).
  • Development of methods for generating multimodal explanations combining visual and textual descriptions of anomalies (WP3).
  • Application of the developed methods to real-world use cases in manufacturing visual inspection and medical imaging interpretation (WP4).
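As a rough illustration of what pairing visual localization with text can look like, the sketch below turns a pixel-level anomaly map into a short templated sentence. It is a deliberately simple stand-in, not the explanation method planned for WP3, which targets richer descriptions produced with vision-language models. The function name, the threshold, and the coarse position vocabulary are assumptions.

```python
import numpy as np


def describe_anomaly(anomaly_map, threshold=0.5):
    """Summarize a 2-D anomaly map (scores in [0, 1]) as a short sentence."""
    anomaly_map = np.asarray(anomaly_map, dtype=float)
    mask = anomaly_map >= threshold
    if not mask.any():
        return "No anomalous region found."

    # Bounding box of all pixels at or above the threshold.
    rows, cols = np.nonzero(mask)
    top, bottom = rows.min(), rows.max()
    left, right = cols.min(), cols.max()

    # Coarse position from the bounding-box center, as a fraction of image size.
    height, width = anomaly_map.shape
    cy = (top + bottom) / 2 / height
    cx = (left + right) / 2 / width
    vertical = "upper" if cy < 1 / 3 else "lower" if cy > 2 / 3 else "middle"
    horizontal = "left" if cx < 1 / 3 else "right" if cx > 2 / 3 else "center"

    area = 100.0 * mask.mean()  # share of pixels flagged as anomalous
    return (
        f"Anomalous region in the {vertical} {horizontal} of the image, "
        f"covering {area:.1f}% of pixels (peak score {anomaly_map.max():.2f})."
    )


# Toy 6x6 map with a high-score block in the top-right corner (assumed data).
example_map = np.zeros((6, 6))
example_map[0:2, 4:6] = 0.9
description = describe_anomaly(example_map)
print(description)
```

A template like this only says where the anomaly is. Explaining why a region is anomalous requires language grounded in image content, which is the part that calls for multimodal models.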

Project phases:

  • Year 1: Focus on local and global appearance learning, object composition learning, and dataset curation (WP1).
  • Year 2: Activities on zero-shot anomaly detection, text-based knowledge injection, text-based weakly labelled supervision, and manufacturing visual inspection (WP2, WP4).
  • Year 3: Focus on text-driven explanations and modeling uncertainty in vision-language models (WP3).

Software and method resources

Project-related software and datasets:

 
AnomalyVFM

Code for AnomalyVFM, a zero-shot anomaly detection framework based on vision foundation models.

 
SALAD

Code for SALAD, a semantics-aware logical anomaly detection method developed within MUXAD.

 
SuperSimpleNet

PyTorch implementation of SuperSimpleNet for fast and reliable surface defect detection across unsupervised and supervised settings.

Publications

  • ObjectCore - Efficient Few-shot Logical Anomaly Detection using Object Representations
    Matic Fučka, Vitjan Zavrtanik and Danijel Skočaj
    IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2026
  • PyramidCore -- Feature Pyramids for Few-Shot Logical Anomaly Detection
    Matic Fučka, Vitjan Zavrtanik and Danijel Skočaj
    2026 IEEE 23rd Mediterranean Electrotechnical Conference (MELECON), 2026
  • Detekcija logičnih anomalij z uporabo velikih jezikovnih modelov (Logical anomaly detection using large language models)
    Matic Fučka and Danijel Skočaj
    ERK 2025
  • Introducing DIAD: A Novel Metric for Assessing the Difficulty of Anomaly Detection Problems
    Jure Pahor and Danijel Skočaj
    ERK 2025
  • No Label Left Behind: A Unified Surface Defect Detection Model for all Supervision Regimes
    Blaž Rolih, Matic Fučka and Danijel Skočaj
    Journal of Intelligent Manufacturing, 2025
  • SALAD -- Semantics-Aware Logical Anomaly Detection
    Matic Fučka, Vitjan Zavrtanik and Danijel Skočaj
    IEEE/CVF International Conference on Computer Vision (ICCV), 2025
  • SuperSimpleNet: Unifying Unsupervised and Supervised Learning for Fast and Reliable Surface Defect Detection
    Blaž Rolih, Matic Fučka and Danijel Skočaj
    Pattern Recognition: 27th International Conference, ICPR 2024, Springer, 2024

Visual Cognitive Systems Laboratory

University of Ljubljana

Faculty of Computer and Information Science

Večna pot 113
SI-1000 Ljubljana
Slovenia
Tel.: +386 1 479 8245